18 research outputs found

    Balanced Training of Energy-Based Models with Adaptive Flow Sampling

    Full text link
    Energy-based models (EBMs) are versatile density estimation models that directly parameterize an unnormalized log density. Although very flexible, EBMs lack a specified normalization constant of the model, making the likelihood of the model computationally intractable. Several approximate samplers and variational inference techniques have been proposed to estimate the likelihood gradients for training. These techniques have shown promising results in generating samples, but little attention has been paid to the statistical accuracy of the estimated density, such as determining the relative importance of different classes in a dataset. In this work, we propose a new maximum likelihood training algorithm for EBMs that uses a different type of generative model, normalizing flows (NF), which have recently been proposed to facilitate sampling. Our method fits an NF to an EBM during training so that an NF-assisted sampling scheme provides an accurate gradient for the EBMs at all times, ultimately leading to a fast sampler for generating new data

    A Deterministic and Generalized Framework for Unsupervised Learning with Restricted Boltzmann Machines

    Full text link
    Restricted Boltzmann machines (RBMs) are energy-based neural-networks which are commonly used as the building blocks for deep architectures neural architectures. In this work, we derive a deterministic framework for the training, evaluation, and use of RBMs based upon the Thouless-Anderson-Palmer (TAP) mean-field approximation of widely-connected systems with weak interactions coming from spin-glass theory. While the TAP approach has been extensively studied for fully-visible binary spin systems, our construction is generalized to latent-variable models, as well as to arbitrarily distributed real-valued spin systems with bounded support. In our numerical experiments, we demonstrate the effective deterministic training of our proposed models and are able to show interesting features of unsupervised learning which could not be directly observed with sampling. Additionally, we demonstrate how to utilize our TAP-based framework for leveraging trained RBMs as joint priors in denoising problems

    Inferring Sparsity: Compressed Sensing using Generalized Restricted Boltzmann Machines

    Get PDF
    In this work, we consider compressed sensing reconstruction from MM measurements of KK-sparse structured signals which do not possess a writable correlation model. Assuming that a generative statistical model, such as a Boltzmann machine, can be trained in an unsupervised manner on example signals, we demonstrate how this signal model can be used within a Bayesian framework of signal reconstruction. By deriving a message-passing inference for general distribution restricted Boltzmann machines, we are able to integrate these inferred signal models into approximate message passing for compressed sensing reconstruction. Finally, we show for the MNIST dataset that this approach can be very effective, even for M<KM < K.Comment: IEEE Information Theory Workshop, 201

    Neural networks: from the perceptron to deep nets

    Full text link
    Artificial networks have been studied through the prism of statistical mechanics as disordered systems since the 80s, starting from the simple models of Hopfield's associative memory and the single-neuron perceptron classifier. Assuming data is generated by a teacher model, asymptotic generalisation predictions were originally derived using the replica method and the online learning dynamics has been described in the large system limit. In this chapter, we review the key original ideas of this literature along with their heritage in the ongoing quest to understand the efficiency of modern deep learning algorithms. One goal of current and future research is to characterize the bias of the learning algorithms toward well-generalising minima in a complex overparametrized loss landscapes with many solutions perfectly interpolating the training data. Works on perceptrons, two-layer committee machines and kernel-like learning machines shed light on these benefits of overparametrization. Another goal is to understand the advantage of depth while models now commonly feature tens or hundreds of layers. If replica computations apparently fall short in describing general deep neural networks learning, studies of simplified linear or untrained models, as well as the derivation of scaling laws provide the first elements of answers.Comment: Contribution to the book Spin Glass Theory and Far Beyond: Replica Symmetry Breaking after 40 Years; Chap. 2

    Modern applications of machine learning in quantum sciences

    Full text link
    In these Lecture Notes, we provide a comprehensive introduction to the most recent advances in the application of machine learning methods in quantum sciences. We cover the use of deep learning and kernel methods in supervised, unsupervised, and reinforcement learning algorithms for phase classification, representation of many-body quantum states, quantum feedback control, and quantum circuits optimization. Moreover, we introduce and discuss more specialized topics such as differentiable programming, generative models, statistical approach to machine learning, and quantum machine learning.Comment: 268 pages, 87 figures. Comments and feedback are very welcome. Figures and tex files are available at https://github.com/Shmoo137/Lecture-Note

    Éléments de compréhension des réseaux de neurones pour l’apprentissage automatique par méthodes de champ moyen

    No full text
    Machine learning algorithms relying on deep new networks recently allowed a great leap forward in artificial intelligence. Despite the popularity of their applications, the efficiency of these algorithms remains largely unexplained from a theoretical point of view. The mathematical descriptions of learning problems involves very large collections of interacting random variables, difficult to handle analytically as well as numerically. This complexity is precisely the object of study of statistical physics. Its mission, originally pointed towards natural systems, is to understand how macroscopic behaviors arise from microscopic laws. In this thesis we propose to take advantage of the recent progress in mean-field methods from statistical physics to derive relevant approximations in this context. We exploit the equivalences and complementarities of message passing algorithms, high-temperature expansions and the replica method. Following this strategy we make practical contributions for the unsupervised learning of Boltzmann machines. We also make theoretical contributions considering the teacher-student paradigm to model supervised learning problems. We develop a framework to characterize the evolution of information during training in these model. Additionally, we propose a research direction to generalize the analysis of Bayesian learning in shallow neural networks to their deep counterparts.Les algorithmes d’apprentissage automatique utilisant des réseaux de neurones profonds ont récemment révolutionné l'intelligence artificielle. Malgré l'engouement suscité par leurs diverses applications, les excellentes performances de ces algorithmes demeurent largement inexpliquées sur le plan théorique. Ces problèmes d'apprentissage sont décrits mathématiquement par de très grands ensembles de variables en interaction, difficiles à manipuler aussi bien analytiquement que numériquement. Cette multitude est précisément le champ d'étude de la physique statistique qui s'attelle à comprendre, originellement dans les systèmes naturels, comment rendre compte des comportements macroscopiques à partir de cette complexité microscopique. Dans cette thèse nous nous proposons de mettre à profit les progrès récents des méthodes de champ moyen de la physique statistique des systèmes désordonnés pour dériver des approximations pertinentes dans ce contexte. Nous nous appuyons sur les équivalences et les complémentarités entre les algorithmes de passage de message, les développements haute température et la méthode des répliques. Cette stratégie nous mène d'une part à des contributions pratiques pour l'apprentissage non supervisé des machines de Boltzmann. Elle nous permet d'autre part de contribuer à des réflexions théoriques en considérant le paradigme du professeur-étudiant pour modéliser des situations d'apprentissage. Nous développons une méthode pour caractériser dans ces modèles l'évolution de l'information au cours de l’entraînement, et nous proposons une direction de recherche afin de généraliser l'étude de l'apprentissage bayésien des réseaux de neurones à une couche aux réseaux de neurones profonds

    Optimizing Markov Chain Monte Carlo Convergence with Normalizing Flows and Gibbs Sampling

    No full text
    International audienceGenerative models have started to integrate into the scientific computing toolkit. One notable instance of this integration is the utilization of normalizing flows (NF) in the development of sampling and variational inference algorithms. This work introduces a novel algorithm, GflowMC, which relies on a Metropolis-within-Gibbs framework within the latent space of NFs. This approach addresses the challenge of vanishing acceptance probabilities often encountered when using NF-generated independent proposals, while retaining non-local updates, enhancing its suitability for sampling multi-modal distributions. We assess GflowMC's performance concentrating on the ϕ 4 model from statistical mechanics. Our results demonstrate that by identifying an optimal size for partial updates, convergence of the Markov Chain Monte Carlo (MCMC) can be achieved faster than with full updates. Additionally, we explore the adaptability of GflowMC for biasing proposals towards increasing the update frequency of critical coordinates, such as coordinates highly correlated to mode switching in multi-modal targets
    corecore